North and South: Danish voters and Syrian refugees

Task 1: Get spatial data for municipalities in Denmark

By now you know how to download administrative data for Denmark from the GADM dataset using the geodata package. Remember to specify what level of admin boundaries you wish to download (0 = country, 1 = first-level subdivision / regions, 2 = second level aka municipalities, etc.). If you still have a functional raster package, you can read this blog on the power of raster package when it comes to available datasets.

Instructions:

  • Load the boundaries of Danish municipalities from data/ folder, convert to simple feature and transform to CRS 25832.
  • Check the NAME_2 field to see how the Danish municipalities are spelled. You may need to change them later for the spatial data to join the attributes.
# Load the spatial data, project to UTM
mun_sp <- readRDS("../data/gadm36_DNK_2_sp.rds")  # it's the gadm_... 2.rds dataset
mun_sf <- st_as_sf(mun_sp)
mun <- st_transform(mun_sf, crs = 25832)

# Plot so as to check correct location and complete coverage
plot(mun$geometry)

# Check the names
mun$NAME_2
 [1] "Albertslund"       "Allerød"           "Ballerup"         
 [4] "Bornholm"          "Brøndby"           "Christiansø"      
 [7] "Dragør"            "Egedal"            "Fredensborg"      
[10] "Frederiksberg"     "Frederikssund"     "Furesø"           
[13] "Gentofte"          "Gladsaxe"          "Glostrup"         
[16] "Gribskov"          "Halsnæs"           "Helsingør"        
[19] "Herlev"            "Hillerød"          "Høje Taastrup"    
[22] "Hørsholm"          "Hvidovre"          "Ishøj"            
[25] "København"         "Lyngby-Taarbæk"    "Rødovre"          
[28] "Rudersdal"         "Tårnby"            "Vallensbæk"       
[31] "Århus"             "Favrskov"          "Hedensted"        
[34] "Herning"           "Holstebro"         "Horsens"          
[37] "Ikast-Brande"      "Lemvig"            "Norddjurs"        
[40] "Odder"             "Randers"           "Ringkøbing-Skjern"
[43] "Samsø"             "Silkeborg"         "Skanderborg"      
[46] "Skive"             "Struer"            "Syddjurs"         
[49] "Viborg"            "Aalborg"           "Brønderslev"      
[52] "Frederikshavn"     "Hjørring"          "Jammerbugt"       
[55] "Læsø"              "Mariagerfjord"     "Morsø"            
[58] "Rebild"            "Thisted"           "Vesthimmerland"   
[61] "Faxe"              "Greve"             "Guldborgsund"     
[64] "Holbæk"            "Kalundborg"        "Køge"             
[67] "Lejre"             "Lolland"           "Næstved"          
[70] "Odsherred"         "Ringsted"          "Roskilde"         
[73] "Slagelse"          "Solrød"            "Sorø"             
 [ reached getOption("max.print") -- omitted 24 entries ]
# Straighten the names (return here after Task 2)

Task 2: Load voting data for 2011 and 2015 and inspect

Load the summarized voting data for 5 biggest parties in 2011 and 2015 and 2019 by municipality. The columns list total votes per party, sum of the electorate (people who qualify to vote), and fraction of vote that each party got in a given year.

  • Create the elections object from the data/elections.rds
  • Inspect it to ensure you understand what the columns contain.
  • Simpligy the mun geometry if it takes long to render.
  • Join the data to the municipality shapes mun by their shared name
  • Plot the Socialdemokrati fraction across Denmark in 2015 and check whether you got all the municipalities. Fix names if not.
  • In which municipalities did the Social Democrats get the highest proportion of population in 2015?
# Load the summarized election data
elections_data <- readRDS("../data/elections.rds")
elections_data
# A tibble: 495 × 11
# Groups:   Name, Party [495]
   Name        Party      Y2011 Y2015 Y2019 sum2011 sum2015 sum2019 pct_vote2011
 * <chr>       <chr>      <dbl> <dbl> <dbl>   <dbl>   <dbl>   <dbl>        <dbl>
 1 Albertslund A. Social…  4823  4836  4464   10935   10027    9780        44.1 
 2 Albertslund C. Det Ko…   547   339   620   10935   10027    9780         5.00
 3 Albertslund F. SF - S…  2148  1098  2092   10935   10027    9780        19.6 
 4 Albertslund O. Dansk …  1984  3116  1225   10935   10027    9780        18.1 
 5 Albertslund B. Radika…  1433   638  1379   10935   10027    9780        13.1 
 6 Allerød     B. Radika…  2181  1287  2359    8441    8960    9896        25.8 
 7 Allerød     A. Social…  2414  3450  3079    8441    8960    9896        28.6 
 8 Allerød     F. SF - S…  1156   664  1171    8441    8960    9896        13.7 
 9 Allerød     C. Det Ko…  1362  1171  2380    8441    8960    9896        16.1 
10 Allerød     O. Dansk …  1328  2388   907    8441    8960    9896        15.7 
# ℹ 485 more rows
# ℹ 2 more variables: pct_vote2015 <dbl>, pct_vote2019 <dbl>
sort(unique(elections_data$Name))
 [1] "Aabenraa"          "Aalborg"           "Aarhus"           
 [4] "Ærø"               "Albertslund"       "Allerød"          
 [7] "Assens"            "Ballerup"          "Billund"          
[10] "Bornholm"          "Brøndby"           "Brønderslev"      
[13] "Christiansø"       "Dragør"            "Egedal"           
[16] "Esbjerg"           "Faaborg-Midtfyn"   "Fanø"             
[19] "Favrskov"          "Faxe"              "Fredensborg"      
[22] "Fredericia"        "Frederiksberg"     "Frederikshavn"    
[25] "Frederikssund"     "Furesø"            "Gentofte"         
[28] "Gladsaxe"          "Glostrup"          "Greve"            
[31] "Gribskov"          "Guldborgsund"      "Haderslev"        
[34] "Halsnæs"           "Hedensted"         "Helsingør"        
[37] "Herlev"            "Herning"           "Hillerød"         
[40] "Hjørring"          "Høje-Taastrup"     "Holbæk"           
[43] "Holstebro"         "Horsens"           "Hørsholm"         
[46] "Hvidovre"          "Ikast-Brande"      "Ishøj"            
[49] "Jammerbugt"        "Kalundborg"        "Kerteminde"       
[52] "København"         "Køge"              "Kolding"          
[55] "Læsø"              "Langeland"         "Lejre"            
[58] "Lemvig"            "Lolland"           "Lyngby-Taarbæk"   
[61] "Mariagerfjord"     "Middelfart"        "Morsø"            
[64] "Næstved"           "Norddjurs"         "Nordfyns"         
[67] "Nyborg"            "Odder"             "Odense"           
[70] "Odsherred"         "Randers"           "Rebild"           
[73] "Ringkøbing-Skjern" "Ringsted"          "Rødovre"          
 [ reached getOption("max.print") -- omitted 24 entries ]
# SImplify Danish municipalities geometry

mun <- mun %>% 
  st_simplify(dTolerance = 100)

# Check that borders still touch
mun %>% 
  mapview()
# Join the election data with municipality polygons
mun %>% 
  full_join(elections_data, join_by("NAME_2" == "Name")) %>% 
  dplyr::filter(grepl("^A", Party)) %>% 
  mapview(zcol = "pct_vote2011")
?full_join()


# Fix the missing counties : Aarhus, Vesthimmerland, Høje Taastrup
mun$NAME_2[31] <- "Aarhus"
mun$NAME_2[21] <- "Høje-Taastrup"
mun$NAME_2[60] <- "Vesthimmerlands"

# Join for real with good municipality names
elections <- mun %>% 
  full_join(elections_data, join_by("NAME_2" == "Name"))

# Map fraction of Socialdemokratie in 2015 to see no counties are missing


elections %>% 
  filter(grepl("^A", Party)) %>%  # A.Socialdemokratie
  dplyr::select("pct_vote2015") %>% 
  mapview()
# Which municipalities are the biggest fans of Socialdemokratie?
elections %>% 
  st_drop_geometry() %>% 
  dplyr::select(NAME_2, Party, pct_vote2011, pct_vote2015, pct_vote2019) %>% 
  dplyr::filter(grepl("^A", Party)) %>% 
  arrange(desc(pct_vote2015))
        NAME_2                Party pct_vote2011 pct_vote2015 pct_vote2019
1     Bornholm A. Socialdemokratiet     58.14078     56.26584     63.12960
2        Morsø A. Socialdemokratiet     55.10737     55.82588     65.61254
3  Christiansø A. Socialdemokratiet     34.21053     52.38095     62.50000
4      Aalborg A. Socialdemokratiet     47.49788     52.11449     56.72821
5      Thisted A. Socialdemokratiet     48.33047     52.04463     58.69840
6     Ballerup A. Socialdemokratiet     48.03978     51.70085     50.26438
7        Odder A. Socialdemokratiet     45.45560     51.65284     45.98328
8       Nyborg A. Socialdemokratiet     47.26519     51.07839     57.47772
9        Skive A. Socialdemokratiet     48.47059     50.66342     54.22290
10     Lolland A. Socialdemokratiet     43.20034     50.63828     61.18079
11      Herlev A. Socialdemokratiet     45.26093     50.10718     49.60733
12     Randers A. Socialdemokratiet     54.64624     49.90296     55.69036
13     Næstved A. Socialdemokratiet     44.27494     49.89141     54.34878
14  Jammerbugt A. Socialdemokratiet     50.08708     49.70149     60.31484
15     Rødovre A. Socialdemokratiet     45.55709     49.68578     47.92825
 [ reached 'max' / getOption("max.print") -- omitted 84 rows ]

Task 3: Look at some of the data

Now that we have a well-structured, complete and spatial dataset, let’s explore the political preference distribution in space with the help of the lovely tmap library!

  • Filter your elections data for Social Democrats and Danske Folkeparti (Hint: grepl() is a good start)
  • then feed the result into tm_shape() and tm_polygons, faceting along the way by party. Since you have 2 parties, you should get two visuals.
  • repeat three times, changing the tm_polygons() data from pct_vote2011 to pct_vote2019
# Let's map the two most popular parties, SD and Danske Folkeparti through time
library(tmap)
elections %>%
    filter(grepl("^A|^O", Party)) %>%
    tm_shape() + tm_facets("Party", ncol = 2) + tm_polygons("pct_vote2011", fill.legend = tm_legend("Percentage of Votes \nin 2011",
    orientation = "portrait", position = tm_pos_out("right", "center")))

elections %>%
    filter(grepl("^A|^O", Party)) %>%
    tm_shape() + tm_facets("Party") + tm_polygons("pct_vote2015", fill.legend = tm_legend("Percentage of Votes \nin 2015",
    orientation = "portrait", position = tm_pos_out("right", "center")))

elections %>%
    filter(grepl("^A|^O", Party)) %>%
    tm_shape() + tm_facets("Party") + tm_polygons("pct_vote2019", fill.legend = tm_legend("Percentage of Votes \nin 2019",
    orientation = "portrait", position = tm_pos_out("right", "center")))

Task 4: Cartogram

As you can see from the maps, the area of municipalities varies considerably. When mapping them, the large areas carry more visual “weight” than small areas, although just as many people or more people live in the small areas. Voters in low-density rural regions can thus visually outweigh the urban hi-density populations.

One technique for correcting for this is the cartogram. This is a controlled distortion of the regions, expanding some and contracting others, so that the area of each region is proportional to a desired quantity, such as the population. The cartogram also tries to maintain the correct geography as much as possible, by keeping regions in roughly the same place relative to each other.

The cartogram package contains functions for creating cartograms. You give it a spatial data frame and the name of a column, and you get back a similar data frame but with regions distorted so that the region area is proportional to the column value of the regions.

You’ll also use the sf package for computing the areas of newly generated regions with the st_area() function.

Instructions

The elections sf object should be already loaded in your environment.

  • Load the cartogram package.
  • Filter out the Danske Folkeparti votes from your elections dataset, creating a DF object
  • Plot total electorate per municipality area for year 2015 in the DF data. Deviation from a straight line shows the degree of misrepresentation.
  • Create a cartogram scaling to the pct_vote2015 column with cartogram_cont() function.
  • Check that the DF voter population is proportional to the area.
  • Plot the pct_vote2015 percentage on the cartogram. Notice how some areas have relatively shrunk or grown.
# load library
library(cartogram)

# Filter out Danske Folkeparti
DF <- elections %>%
    filter(grepl("^O", Party))

# Check the spread of votes and municipality area
plot(DF$pct_vote2015, st_area(DF, byid = TRUE)/1e+06, xlab = "Vote %", ylab = "Area (km2)",
    main = "Dansk Folkeparti fraction per municipality area")

# Make a cartogram, scaling the area to the percentage of SD voters
DF2015 <- cartogram_cont(DF, "pct_vote2015")

# Check the linearity of the SD voters percentage per municipality plot
plot(DF2015$pct_vote2015, st_area(DF2015, byid = TRUE))

plot(DF2015$geometry)

Copacetic cartogram! Now try to rerun the cartogram for the Social Democrats in 2015 and create a visual for both parties’ turnout and total electorate in 2015.

library(cartogram)
# Let's look at Social Democrats in 2015
SD <- elections %>%
    filter(grepl("^A", Party))

# Make a cartogram, scaling the area to the total number of votes cast in 2015
SD2015 <- cartogram_cont(SD, "pct_vote2015")

# Now check the linearity of the total voters per municipality cartogram as
# opposed to the reality
plot(SD$sum2015, st_area(SD, byid = TRUE))  # reality

plot(SD2015$sum2015, st_area(SD2015, byid = TRUE))  # cartogram

# Make a adjusted map of the 2015 SD and DF voters
plot(SD2015$geometry, col = "pink", main = "% of Social Democrat votes across DK in 2015")

plot(DF2015$geometry, col = "lightblue", main = "% of Danske Folkeparti votes across DK in 2015")

Task 5: Spatial autocorrelation test

If we look at the facetted tmaps the election results in 2015 seem to have spatial correlation - specifically the percentage of voters favoring Danske Folkeparti increases as you move towards the German border. This trend is not as visible in the cartogram, where the growth is more apparent in Sjæland, and other islands, like Samsø.

How much similarity and spatial dependence is there, really?

By similarity or positive correlation, we mean : pick any two kommunes that are neighbors - with a shared border - and their attributes will be more similar than any two random municipalities. Such autocorrelation or spatial dependence can be a problem when using statistical models that assume, conditional on the model, that the data points are independent.

The spdep package has functions for measures of spatial autocorrelation, also known as spatial dependency. Computing these measures first requires you to work out which regions are neighbors via the poly2nb() function, short for “polygons to neighbors”. The result is an object of class nb. Then you can compute the test statistic and run a significance test on the null hypothesis of no spatial correlation. The significance test can either be done by Monte-Carlo or theoretical models.

In this example you’ll use the Moran “I” statistic to test the spatial correlation of the Danske Folkeparti voters in 2015.

Instructions I - defining neighbors

  • Load the elections spatial dataset with attributes
  • Consider simplifying the boundaries if the data is too heavy for your computer and takes long to visualise
  • Load the spdep library and create nb object of neighbors using queen adjacency
  • Pass elections to poly2nb() to find the neighbors of each municipality polygon. Assign to nb.
  • Get the center points of each municipality by passing elections to st_centroid and then to st_coordinates(). Assign to mun_centers.
  • Update the basic map of the DK municipalities by adding the connections.
    • In the second plot call pass nb and mun_centers.
    • Also pass add = TRUE to add to the existing plot rather than starting a new one.
# Reload the data if needed elections
plot(elections$geometry)

# Consider simplifying (don't go too high)
mun_sm <- st_cast(st_simplify(mun, dTolerance = 100), to = "MULTIPOLYGON")
mapview(mun_sm)
length(st_is_valid(mun_sm$geometry))
[1] 99
# Use the spdep package
library(spdep)

# Make neighbor list following queen adjacency
`?`(poly2nb)
nb <- poly2nb(mun_sm$geometry)
nb
Neighbour list object:
Number of regions: 99 
Number of nonzero links: 352 
Percentage nonzero weights: 3.59147 
Average number of links: 3.555556 
11 regions with no links:
4 6 10 43 55 57 63 68 79 84 89
16 disjoint connected subgraphs
# Get center points of each municipality
mun_centers <- st_coordinates(st_centroid(mun_sm$geometry))

# Show the connections
plot(mun_sm$geometry)
plot(nb, mun_centers, col = "red", add = TRUE)

Instructions II - Moran’s I

Now that your neighbors are determined and centroids are computed, let’s continue with the Moran’s I statistic

  • Create a subset with municipalities for O.Danske Folkeparti
  • Feed the pct_2011 vector into moran.test().
    • moran.test() needs a weighted version of the nb object which you get by calling nb2listw().
    • After you specify your neighbor nb object you should define the weights style = "W". Here, style = "W" indicates that the weights for each spatial unit are standardized to sum to 1 (this is known as row standardization). For example, municipality 1 has 3 neighbors, and each of those neighbors will have weights of 1/3. This allows for comparability between areas with different numbers of neighbors.
    • You will need another argument in both spatial weights and at the level of the test. zero.policy= TRUE deals with situations when an area has no neighbors based on your definition of neighbor (many islands in Denmark). When this happens and you don’t include zero.policy= TRUE, you’ll get an error.
    • Run the test against the theoretical distribution of Moran’s I statistic. Find the p-value. Can you reject the null hypothesis of no spatial correlation?
  • Inspect a map of pct_2011.
  • Run another Moran I statistic test, this time on Social Democrats.
    • Use 999 Monte-Carlo iterations via moran.mc().
    • The first two arguments are the same as for moran.test().
    • You also need to pass the argument nsim = 999.
    • Note the p-value. Can you reject the null hypothesis this time?
# Let's look at Danske Folkeparti
DF
Simple feature collection with 99 features and 23 fields
Geometry type: GEOMETRY
Dimension:     XY
Bounding box:  xmin: 441745.6 ymin: 6049781 xmax: 892801.1 ymax: 6402207
Projected CRS: ETRS89 / UTM zone 32N
First 10 features:
  GID_0  NAME_0   GID_1      NAME_1 NL_NAME_1     GID_2      NAME_2 VARNAME_2
1   DNK Denmark DNK.1_1 Hovedstaden      <NA> DNK.1.1_1 Albertslund      <NA>
2   DNK Denmark DNK.1_1 Hovedstaden      <NA> DNK.1.2_1     Allerød      <NA>
3   DNK Denmark DNK.1_1 Hovedstaden      <NA> DNK.1.3_1    Ballerup      <NA>
  NL_NAME_2  TYPE_2    ENGTYPE_2 CC_2   HASC_2               Party Y2011 Y2015
1      <NA> Kommune Municipality <NA> DK.HS.AB O. Dansk Folkeparti  1984  3116
2      <NA> Kommune Municipality <NA> DK.HS.AL O. Dansk Folkeparti  1328  2388
3      <NA> Kommune Municipality <NA> DK.HS.BA O. Dansk Folkeparti  4295  6767
  Y2019 sum2011 sum2015 sum2019 pct_vote2011 pct_vote2015 pct_vote2019
1  1225   10935   10027    9780     18.14358     31.07609    12.525562
2   907    8441    8960    9896     15.73273     26.65179     9.165319
3  2898   20712   20284   19101     20.73677     33.36127    15.171981
                        geometry
1 POLYGON ((712057 6173414, 7...
2 POLYGON ((700891 6191571, 7...
3 POLYGON ((715156 6178972, 7...
 [ reached 'max' / getOption("max.print") -- omitted 7 rows ]
# Run a Moran I test test on 2015 DF vote
moran.test(DF$pct_vote2015, nb2listw(nb, style = "W", zero.policy = TRUE), zero.policy = TRUE)

    Moran I test under randomisation

data:  DF$pct_vote2015  
weights: nb2listw(nb, style = "W", zero.policy = TRUE)  
n reduced by no-neighbour observations  

Moran I statistic standard deviate = 6.6846, p-value = 1.158e-11
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.529642335      -0.011494253       0.006553402 
# Run a Moran I test test on 2011 DF vote
moran.test(DF$pct_vote2011, nb2listw(nb, style = "W", zero.policy = TRUE), zero.policy = TRUE)

    Moran I test under randomisation

data:  DF$pct_vote2011  
weights: nb2listw(nb, style = "W", zero.policy = TRUE)  
n reduced by no-neighbour observations  

Moran I statistic standard deviate = 6.2103, p-value = 2.645e-10
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance 
      0.492919418      -0.011494253       0.006597096 
# Do a Monte Carlo simulation to get a more reliable p-value
moran.mc(DF$pct_vote2011, nb2listw(nb, zero.policy = TRUE), zero.policy = TRUE, nsim = 999)

    Monte-Carlo simulation of Moran I

data:  DF$pct_vote2011 
weights: nb2listw(nb, zero.policy = TRUE)  
number of simulations + 1: 1000 

statistic = 0.49292, observed rank = 1000, p-value = 0.001
alternative hypothesis: greater

Marvelous Moran Testing! When I ran the examples, the p-value was around 1.584e-11 in 2015 and 2.685e-10 in 2011 Moran tests, showing significant spatial correlation. In Monte Carlo simulation, the p-value was around 0.001, so I did find some significant spatial (positive) correlation.

Repeat the same test for Social Democrats

# Let's look at Social Democrats

# Run a Moran I test on percentage of SD turnout in 2011

# Run a Moran I test on percent of SD turnout in 2015

# Do a Monte Carlo simulation to get a better p-value

# Do a Monte Carlo simulation to get a better p-value

Phenomenal political testing. Social Democrats also show positive correlation. p-value in Moran I test is was around 8.044e-09 in 2011 results and 2.654e-07 in 2015 results, and thus significant result. In Monte Carlo simulation, the p-value was around 0.24, suggesting there is significant (positive) spatial correlation.

Task 6: Different sorts of neighborhood: 50 km

Does the result hold if you use a different scale / neighborhood calculation?

Connect the nearest places (islands)

# Consider simplifying (don't go too high)
mun_sm <- st_cast(st_simplify(mun, dTolerance = 250), to = "MULTIPOLYGON")
plot(mun_sm$geometry)

# Use the spdep package
library(spdep)

# Get center points of each municipality
mun_centers <- st_centroid(mun_sm$geometry, of_largest_polygon = TRUE)

# Make neighbor list from neighbours at 100km distance
nb_100 <- dnearneigh(mun_centers, 0, 1e+05)
plot(mun_sm$geometry)
plot(nb_100, mun_centers, col = "red", add = TRUE)

# Make neighbor list from neighbours at 50km distance
nb_50 <- dnearneigh(mun_centers, 0, 50000)
plot(mun_sm$geometry)
plot(nb_50, mun_centers, col = "blue", add = TRUE)
title(main = "Neighbours within 50 km distance")

Task 7: Different sorts of neighbourhood: k neighbors

# Consider simplifying (don't go too high)
mun_sm <- st_cast(st_simplify(mun, dTolerance = 250), to = "MULTIPOLYGON")
plot(mun_sm$geometry)

# Use the spdep package
library(spdep)

# Get center points of each municipality
mun_centers <- st_centroid(mun_sm$geometry, of_largest_polygon = TRUE)

# Make neighbor list from 3 nearest neighbours
k3 <- knearneigh(mun_centers, k = 3)
nb_k3 <- knn2nb(knearneigh(mun_centers, k = 3))

plot(mun_sm$geometry)
plot(nb_k3, mun_centers, col = "red", add = TRUE)
title(main = "3 nearest neighbours")

# Make neighbor list from 3 nearest neighbours
nb_k5 <- knn2nb(knearneigh(mun_centers, k = 5))
plot(mun_sm$geometry)
plot(nb_k5, mun_centers, col = "red", add = TRUE)
title(main = "5 nearest neighbours")

Task 8: Rerun Moran’s I and MC

Now let’s rerun Moran’s I and MC with different neighbour conceptions

# Run a Moran I test on Dansk Folkeparti votes in 2015 based on k neighbors 
moran.test(DF$_____,
           ________(nb_k3, style = "W",zero.policy=TRUE),
           zero.policy=TRUE)

# Do a Monte Carlo simulation to get a better p-value
moran.mc(DF$_____,
           ________(nb_k3, s zero.policy=TRUE),
         zero.policy=TRUE, nsim = 999)

# Run a Moran I test on Dansk Folkeparti votes in 2011 based on k neighbors
moran.test(DF$_____,
           ________(nb_k3, style = "W",zero.policy=TRUE),
           zero.policy=TRUE)

# Do a Monte Carlo simulation to get a better p-value
moran.mc(DF$___________,
         ________(nb_k3, zero.policy=TRUE),
         zero.policy=TRUE, nsim = 999)